Table S1: Summary statistics for validated epistatic SNP pairs (see attached Excel spreadsheet).
Dataset       Two SNPs inside MHC  One SNP inside MHC  Both SNPs outside MHC
UK1           5,930 (5,359)        0 (0)               1 (0)
UK2           99,205 (22,028)      0 (0)               22 (0)
FIN           22,699 (17,065)      0 (0)               5 (0)
NL            2,227 (2,224)        0 (0)               0 (0)
IT            883 (395)            0 (0)               6 (0)
Unique pairs  126,462 (20,542)     0 (0)               34 (0)









Table S2: Number of SNP pairs detected in each dataset and, in brackets, the number that were also
significant in at least one other cohort, separated into pairs where both SNPs are inside the MHC
region, pairs with one SNP inside the MHC region and one outside, and pairs where both SNPs are
outside the MHC region.
                              Single SNPs                      Combined (single SNPs + pairs)   Validated epistatic pairs
Validation           Dataset  Var. Exp.  AUC [95% CI]          Var. Exp.  AUC [95% CI]          Var. Exp.  AUC [95% CI]
Cross-validation     UK1+UK2  0.320      0.879 [0.878, 0.879]  0.335      0.885 [0.885, 0.886]  0.317      0.878 [0.877, 0.878]
External validation  Finn     0.353      0.892 [0.879, 0.906]  0.368      0.898 [0.885, 0.911]  0.347      0.890 [0.876, 0.904]
                     IT       0.288      0.864 [0.843, 0.886]  0.309      0.874 [0.853, 0.895]  0.288      0.864 [0.842, 0.887]
                     NL       0.298      0.869 [0.852, 0.886]  0.298      0.869 [0.852, 0.886]  0.291      0.866 [0.848, 0.884]



Table S3: Predictive power and disease variance explained by SparSNP models with additive and
epistatic genetic effects, for single SNPs and pairs, in cross-validation and in external validation.
Models were optimized in cross-validation on the combined UK1 + UK2 dataset (n = 7,786 samples)
using 290K SNPs, 5,359 pairs encoded as 48,231 indicator variables, or both, and were then tested
without modification on the other datasets. The 5,359 pairs were based on the UK1 dataset. The
proportion of disease variance explained assumes a population prevalence of 1%. The 95% CI for
AUC in UK1+UK2 was computed over the 10×10 cross-validation, and in external validation was
computed using DeLong's method (R package pROC).
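
The two numerical conventions in the Table S3 caption can be illustrated with a short Python sketch. It rests on assumptions the caption does not state. First, 48,231 = 5,359 × 9, which suggests (but does not confirm) that each pair is encoded as nine indicators, one per joint genotype; the function name pair_indicators and the 0/1/2 genotype coding are hypothetical. Second, the conversion from AUC to variance explained is assumed here to follow a normal liability-threshold model in which the risk score is normally distributed in cases and in controls; the function liability_variance_from_auc is an illustration of that model, not necessarily the calculation used for the table.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def pair_indicators(g1, g2):
    """One-hot encode the joint genotype of a SNP pair as 9 indicators.

    Assumption: each genotype is coded 0, 1 or 2 (minor-allele count).
    """
    if g1 not in (0, 1, 2) or g2 not in (0, 1, 2):
        raise ValueError("genotypes must be coded 0, 1 or 2")
    encoding = [0] * 9
    encoding[3 * g1 + g2] = 1
    return encoding


def liability_variance_from_auc(auc, prevalence):
    """Variance explained on the liability scale implied by an AUC.

    Assumption: liability-threshold model with a normally distributed score.
    """
    t = _STD_NORMAL.inv_cdf(1.0 - prevalence)  # liability threshold
    z = _STD_NORMAL.pdf(t)                     # normal density at the threshold
    i = z / prevalence                         # mean liability of cases
    v = -i * prevalence / (1.0 - prevalence)   # mean liability of controls
    q = _STD_NORMAL.inv_cdf(auc)
    return 2.0 * q ** 2 / ((i - v) ** 2 + q ** 2 * (i * (i - t) + v * (v - t)))


if __name__ == "__main__":
    print(pair_indicators(1, 2))
    print(round(liability_variance_from_auc(0.879, 0.01), 3))
```

Under these assumptions, an AUC of 0.879 at 1% prevalence maps to a variance explained of roughly 0.32, which is in line with the UK1+UK2 single-SNP entry of the table.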
Figure S1: Study workflow
Figure S2: Cumulative distribution of LD between epistatic pairs in the UK1 cohort. LD was
measured by phasing the data with SHAPEIT [1] and calculating LD directly from the control
samples only.
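
As an illustration of how LD can be computed from phased data, the sketch below calculates r² between two biallelic SNPs from phased haplotypes and evaluates an empirical cumulative distribution. The caption does not say which LD measure was used, so r² is an assumption, as are the function names ld_r2 and ecdf and the 0/1 allele coding. Phasing itself (done with SHAPEIT in the study) is not reproduced here.

```python
import math


def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic loci from phased haplotypes.

    hap_a[k] and hap_b[k] are the alleles (0 or 1) carried at each locus by
    haplotype k, e.g. haplotypes of control samples only.
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("haplotype vectors must be non-empty and equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                                # allele frequency, locus A
    p_b = sum(hap_b) / n                                # allele frequency, locus B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                # disequilibrium coefficient D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        return math.nan                                 # undefined for a monomorphic locus
    return d * d / denom


def ecdf(values, x):
    """Fraction of values less than or equal to x (empirical CDF at x)."""
    return sum(1 for value in values if value <= x) / len(values)


if __name__ == "__main__":
    r2_values = [
        ld_r2([0, 0, 1, 1], [0, 0, 1, 1]),  # identical haplotype patterns
        ld_r2([0, 0, 1, 1], [0, 1, 0, 1]),  # no association
    ]
    print(r2_values, ecdf(r2_values, 0.5))
```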
Figure S3: Manhattan plots of the MHC for association of single SNPs (top panel) compared with
association of epistatic pairs (bottom panel). The top panel shows the strength of association with
celiac disease in the UK1 dataset as the -log10(P) from a chi-squared test. The bottom panel
shows the epistatic association of pairs that achieved Bonferroni-adjusted significance according to
the GSS statistic. For each pair, we plot two points showing the locations of the two constituent
SNPs. The SNPs in the top 25 strongest epistatic pairs are marked in orange in both plots.
Vertical green and grey lines indicate selected genes, with line width denoting gene size.
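
The quantities plotted in the top panel can be sketched as follows. The caption says only that a chi-squared test was used, so the 2×3 genotypic test (2 degrees of freedom) below is an assumption, and the function names and counts are hypothetical. The GSS statistic used for the pairs is not reproduced here; only a generic Bonferroni cutoff is shown, with the number of tests left as a parameter because the caption does not give it.

```python
import math


def chi2_genotype(cases, controls):
    """Pearson chi-squared statistic for a 2x3 case/control genotype table.

    Assumption: every genotype column has a non-zero total count.
    """
    if len(cases) != 3 or len(controls) != 3:
        raise ValueError("expected three genotype counts per group")
    table = [list(cases), list(controls)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def neg_log10_p_2df(chi2):
    """-log10(P) for a chi-squared statistic with 2 degrees of freedom.

    With 2 df the survival function is exp(-chi2 / 2), so the value is
    computed in closed form, which avoids underflow for very small P.
    """
    return chi2 / (2.0 * math.log(10.0))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni adjustment."""
    return alpha / n_tests


if __name__ == "__main__":
    stat = chi2_genotype([10, 20, 30], [30, 20, 10])  # made-up counts
    print(stat, neg_log10_p_2df(stat), bonferroni_threshold(0.05, 100))
```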



[Figure S4 heatmap. Horizontal axis: Samples. Row labels, top to bottom:]
Disease status
DQA1_0501
DQB1_0201
DQB1_0301
DQB1_0202
DQA1_0201
DQB1_0302
DQA1_0301
DQA1_0505
rs2260000, rs805262
rs9268542, rs2856997
rs2647050, rs2856705
rs3948793, rs2854028
rs3117098, rs9275390
rs3130931, rs3828903
rs241424, rs2071543
rs3095352, rs2250264
rs2071556, rs209474
rs2256965, rs3830041
rs2070600, rs3129871
rs1062470, rs3130712
rs3871466, rs2269425
rs12660382, rs2071596
rs2535319, rs241437
rs3129274, rs213212
rs29232, rs7776082
rs6900224, rs12525342
rs2451741, rs2494711
rs6456785, rs6918131



Figure S4: Balanced sample penetrance for 20 independent epistatic pairs (at a Q threshold of <0.3)
and eight CD risk haplotypes from the UK1 dataset. Balanced sample penetrance indicates the risk
that a sample with a given genotype is a case; red denotes high risk and green low risk. The top
row (disease status) gives the true sample labels.